In [1]:
import arcgis
from getpass import getpass
import pandas as pd
Create a GIS object instance. Authenticating with the account currently logged in through ArcGIS Pro is an option (left commented out below); here we log in with explicit credentials instead.
In [2]:
# gis = arcgis.gis.GIS('Pro')  # optional: use if ArcGIS Pro authentication is available
gis = arcgis.gis.GIS(
username='jmccune_retail',
password=getpass('Please enter the headless password: ')
)
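As a quick sanity check (not part of the original run), confirm the connection is live by inspecting the authenticated user.
In [ ]:
# display the user we just authenticated as
gis.users.me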
Create a Web GIS Item instance using the Item ID
In [3]:
trade_area_itemid = '44c0b666404c41b8a14e04c34d09be1e'
item = arcgis.gis.Item(gis=gis, itemid=trade_area_itemid)
item
Out[3]:
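As an aside, the content manager offers an equivalent lookup; a one-line sketch, assuming the same item ID:
In [ ]:
# equivalent retrieval through the content manager
item = gis.content.get(trade_area_itemid)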
Since the item contains only one feature layer, get the first layer in the item: the Feature Layer we need to work with.
In [4]:
feature_layer = item.layers[0]
feature_layer
Out[4]:
Use the query method to return the data as a FeatureSet.
In [5]:
feature_set = feature_layer.query()
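The query method also accepts filtering parameters if only a subset is needed; the where clause and field list below are purely illustrative:
In [ ]:
# illustrative only: restrict the rows and fields returned by the service
feature_set_subset = feature_layer.query(where='SALESVOL > 0', out_fields='SALESVOL,CITY')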
Take a quick look at the field names and aliases in the FeatureSet.
In [6]:
df_fields = pd.DataFrame([(field['name'], field['alias']) for field in feature_set.fields], columns=['NAME', 'ALIAS'])
df_fields
Out[6]:
Take advantage of the df property on the FeatureSet object returned from the query to convert the data to a Pandas DataFrame.
In [7]:
df = feature_set.df
df.head()
Out[7]:
Because I am neurotic and dislike the prefixes the join added to the field names when preparing the data in ArcMap, we clean them up a little here.
In [8]:
df.columns = [field.replace('Target_Locations_', '').replace('overlayTemplate_', '') for field in df.columns]
df.head()
Out[8]:
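For what it is worth, the same cleanup can be written as a single regular expression substitution; a stylistic alternative, not a change in behavior:
In [ ]:
import re
# strip either join prefix in one pass
df.columns = [re.sub(r'^(Target_Locations_|overlayTemplate_)', '', field) for field in df.columns]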
Use KMeans cluster analysis from the scikit-learn package to group demographically similar stores into segments.
In [9]:
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
%matplotlib inline
Create the KMeans clusterer, specifying five clusters, or segments.
In [10]:
km = KMeans(n_clusters=5)
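KMeans starts from a random initialization, so the segment numbering can shuffle between runs; a hedged sketch, not part of the original run, pins the seed for reproducibility:
In [ ]:
# illustrative only: a fixed random_state keeps the segment labels stable across runs
km = KMeans(n_clusters=5, random_state=42)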
Use the fit_predict method to create the clusters using just the numeric fields, excluding the OID and sales fields.
In [11]:
df_independent = df.select_dtypes(include=[np.number]).drop(['OBJECTID', 'SALESVOL'], axis=1)
fields_independent = df_independent.columns # save for later
field_name_dependent = 'SALESVOL' # keep track of this as well
In [12]:
df['segment'] = km.fit_predict(df_independent)
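After fitting, the model exposes a couple of diagnostics worth a glance; a quick sketch:
In [ ]:
# within-cluster sum of squared distances; lower means tighter clusters
print(km.inertia_)
# one center per segment, one coordinate per independent field
print(km.cluster_centers_.shape)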
In [13]:
df.head()
Out[13]:
In [14]:
df_count = df.segment.value_counts().reset_index().rename(columns={'index': 'segment', 'segment': 'count'})
df_mean = df.groupby('segment').mean()[field_name_dependent].reset_index().rename(
    columns={field_name_dependent: 'mean_sales'})
df_median = df.groupby('segment').median()[field_name_dependent].reset_index().rename(
    columns={field_name_dependent: 'median_sales'})
In [15]:
df_summary = pd.merge(df_count, df_mean)
df_summary = pd.merge(df_summary, df_median)
df_summary = df_summary.sort_values('mean_sales', ascending=False)
df_summary['mean_sales_zscore'] = (df_summary.mean_sales - df.SALESVOL.mean()) / df.SALESVOL.std(ddof=0)
# report the population standard deviation (ddof=0) to match the z-score above
print('Sales Mean: {:.2f}\nSales Standard Deviation: {:.2f}'.format(df.SALESVOL.mean(), df.SALESVOL.std(ddof=0)))
df_summary
Out[15]:
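Incidentally, the same count, mean, and median table can be built with a single groupby aggregation instead of three merges; a sketch:
In [ ]:
# equivalent summary in one aggregation
df_summary_alt = df.groupby('segment')[field_name_dependent].agg(['count', 'mean', 'median']).reset_index()
df_summary_alt.columns = ['segment', 'count', 'mean_sales', 'median_sales']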
Now, using data from this DataFrame, visualize the mean sales of each segment against the overall mean, with reference lines one-half standard deviation above and below it.
In [18]:
ax = df_summary.sort_values('mean_sales').plot.bar(y='mean_sales', x='segment', figsize=(15, 8))
ax.set_ylabel('mean sales')
ax.set_title('Mean Sales by KMeans Segment')
ax.axhline(y=df.SALESVOL.mean(), color='r') # add the mean for reference
ax.axhline(y=df.SALESVOL.mean() - df.SALESVOL.std() * 0.5, color='g')  # half a standard deviation below the mean
ax.axhline(y=df.SALESVOL.mean() + df.SALESVOL.std() * 0.5, color='g')  # half a standard deviation above the mean
Out[18]:
As a quick exploratory check, flag the stores in segment 3 and look at which cities they fall in.
In [21]:
df['segment_3'] = df.segment == 3  # boolean flag: is this store in segment 3?
df
Out[21]:
In [24]:
df.groupby("segment_3").CITY.value_counts()
Out[24]:
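If a wide layout reads more easily, pd.crosstab gives the same city breakdown:
In [ ]:
# cities as rows, the segment_3 flag as columns
pd.crosstab(df.CITY, df.segment_3)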
After doing this analysis, we begin by showing the locations of the top-performing stores on the map relative to the rest of the stores.
In [17]:
top_segment = df_summary.iloc[0].segment
top_segment
Out[17]:
This is a first stab at mapping the data just using the draw method, which plots the data on the map much like drawing with a Sharpie. The data is not organized into a layer per se, so it is not very useful. Still, it is interesting at a cursory level, just seeing the rest of the stores and the top performers.
In [26]:
store_map = gis.map('Meades Ranch, KS', 4)
store_map.basemap = 'gray'
store_map
In [19]:
# plot all other stores as small circles
for index, row in df[df.segment != top_segment].iterrows():
    store_map.draw(row.SHAPE, symbol={'type': 'esriSMS', 'style': 'esriSMSCircle', 'size': 4})

# plot the top-performing segment as larger green circles
for index, row in df[df.segment == top_segment].iterrows():
    store_map.draw(row.SHAPE, symbol={'type': 'esriSMS', 'style': 'esriSMSCircle', 'color': [0, 255, 0, 255], 'size': 6})
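If the drawing needs to be redone, the widget can be reset first; this assumes clear_graphics is available in your version of the API:
In [ ]:
# remove everything added with draw() so the map can be redrawn from scratch
store_map.clear_graphics()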
Finally, use Principal Component Analysis (PCA) to collapse the many demographic fields into two dimensions so the segments and their cluster centers can be visualized in a scatter plot.
In [18]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
In [41]:
pca = PCA(n_components=2)
In [50]:
# fit PCA once and reuse the projected coordinates for both axes
components = pca.fit_transform(df[fields_independent])
df['x'] = components[:, 0]
df['y'] = components[:, 1]
In [61]:
plt.scatter(df.x, df.y, c=df.segment)  # stores, colored by segment
# project the KMeans cluster centers into the same two PCA dimensions
cluster_centers = pd.DataFrame(pca.transform(km.cluster_centers_), columns=['x', 'y'])
cluster_centers['segment'] = range(len(cluster_centers))
plt.scatter(cluster_centers.x, cluster_centers.y, c='red', marker='x', s=100)  # cluster centers
plt.show()
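Because the scatter plot collapses many demographic fields into two axes, it is worth checking how much of the variance those two components actually capture; a quick sketch:
In [ ]:
# fraction of total variance captured by each component, and their sum
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())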